Abstract

We study the well-known Label Cover problem under the additional requirement that problem
instances have large girth. We show that if the girth is some $k$, the problem is roughly
$2^{(\log^{1-\epsilon} n)/k}$ hard to approximate for all constant $\epsilon > 0$. A similar theorem was claimed by Elkin
and Peleg [ICALP 2000], but their proof was later found to have a fundamental error. We use
the new proof to show inapproximability for the basic k-spanner problem, which is both the
simplest problem in graph spanners and one of the few for which super-logarithmic hardness was
not known. Assuming NP $\not\subseteq$ BPTIME$(2^{\mathrm{polylog}(n)})$, we show that for every $k \ge 3$ and every
constant $\epsilon > 0$ it is hard to approximate the basic k-spanner problem within a factor better
than $2^{(\log^{1-\epsilon} n)/k}$ (for large enough $n$). A similar hardness for basic k-spanner was claimed by
Elkin and Peleg [ICALP 2000], but the error in their analysis of Label Cover made this proof
fail as well. Thus for the problem of Label Cover with large girth we give the first non-trivial
lower bound. For the basic k-spanner problem we improve the previous best lower bound of
$\Omega(\log n)/k$ from [24]. Our main technique is subsampling the edges of 2-query PCPs, which
allows us to reduce the degree of a PCP to be essentially equal to the soundness desired. This
turns out to be enough to essentially guarantee large girth.

1 Introduction

In this paper we deal with 2-query probabilistically checkable proofs (PCPs) and variants of the 
Label Cover problem. Label Cover was originally defined by Arora and Lund in their early survey 
on hardness of approximation ||. Since then, Label Cover has been widely used as a starting point 
when proving hardness of approximation, as it corresponds naturally to 2-query probabilistically 
checkable proofs and one-round two-prover interactive proof systems. Notable examples are the
reduction to the Set Cover problem [27, 17], the reduction to the maximum independent set problem
[27, 21] and the reduction to the minimum coloring problem [18]. Certain reductions from Label
Cover, though, require special properties of the Label Cover instances. So then the question 
becomes: is Label Cover still hard even when restricted to these instances? For example, the 
famous Unique Games Conjecture of Khot |2J| can be thought of as a conjecture that Label Cover 
is still difficult to approximate when the relation on each edge is required to be a bijection. A 
different type of requirement is on the structure of the Label Cover graph rather than on the 
allowed relations; Elkin and Peleg [15] showed that if Label Cover (actually, a slight variant known
as Min-Rep) is hard even on graphs with large girth then the basic k-spanner problem is also
hard to approximate. They then gave a proof that Label Cover is indeed hard to approximate on
large-girth graphs, but unfortunately this proof was later found to have a flaw (as Elkin and Peleg
point out in [16, Section 1.3]). In this paper we give a completely new proof that Label Cover and
Min-Rep are hard to approximate on large-girth graphs. Our argument is based on subsampling
edges of a 2-query PCP/Label Cover instance. Subsampling of 2-query PCPs and Label Cover
instances has been done before for other reasons (see e.g. [20]), but we show that the sampling
probability can be set low enough to destroy most small cycles while still preserving hardness, and
this technique was not previously used in this context. Remaining cycles can then be removed
deterministically.

1.1 Label Cover and Probabilistically Checkable Proofs 

A probabilistically checkable proof (PCP) is a string (proof) together with a verifier algorithm that 
checks the proof (probabilistically). There are several important parameters of a PCP, including 
the following: 

1. Completeness (c): the minimum probability that the verifier accepts a correct proof. All of 
the PCPs in this paper have completeness 1. 

2. Soundness (or error) (s): the maximum probability that the verifier accepts a proof of an 
incorrect claim. 

3. Queries: the number of queries that the verifier makes to the proof. In this paper we will 
study the case when the verifier only makes 2 queries. 

4. Size: the number of positions in the proof (i.e. the length). 

5. Alphabet ($\Sigma$): the set of symbols that the proof is written in. We will only be concerned
with PCPs for which $|\Sigma|$ is at most polynomial in the size of the PCP, so we will assume this
throughout.

For this paper we will be concerned with 2-query PCPs, which are the special case when the 
verifier is only allowed to query two positions of the proof. We will also assume (without loss 
of generality) that these two queries are to different parts of the proof, i.e. there is some set of 
positions A that can be read by the first query and some other set of positions B that can be read 
by the second query with $A \cap B = \emptyset$.

For this type of PCP, there are two natural graphs that represent it. The first (and simpler), 
which is sometimes called the supergraph, is a bipartite graph (A, B, E) in which there is a vertex 
for every position of the proof and an edge between two positions if there is a possibility that 
the verifier might query those two positions. By our assumption, this graph is bipartite. We also 
assume that the verifier chooses its query uniformly at random from these edges, which is in fact 
the case in the PCPs that we will use (in particular in the PCP for Max-3SAT(5) obtained by 
parallel repetition). Vertices of this graph will sometimes be called supervertices, and edges will be
called superedges.

The second graph can be thought of as an expansion of the supergraph in which the test the
verifier does is explicitly contained in the graph. This graph is also bipartite, with vertex set
$(A \times \Sigma_A, B \times \Sigma_B)$, where $\Sigma_A$ is the alphabet used in $A$ positions of the proof and similarly for
$\Sigma_B$. There is an edge between vertices $(a, \alpha)$ and $(b, \beta)$ if the verifier might query $a$ and $b$ together
(i.e. $(a, b)$ is an edge in the supergraph) and if, upon such queries, the verifier will accept the proof
if it sees value $\alpha$ in position $a$ and $\beta$ in position $b$. We call this graph the Min-Rep graph to
distinguish it from the supergraph. This is related to the work in [24].

There is a natural correspondence between these types of PCPs and the optimization problem
of Label Cover. In Label Cover we are given a bipartite graph $G = (A, B, E)$, alphabets $\Sigma_A$
and $\Sigma_B$, and for every edge $e \in E$ a nonempty relation $\pi_e \subseteq \Sigma_A \times \Sigma_B$. The goal is to find
assignments $\gamma_A : A \to \Sigma_A$ and $\gamma_B : B \to \Sigma_B$ in order to maximize the number of edges $e = (a, b)$
for which $(\gamma_A(a), \gamma_B(b)) \in \pi_e$ (in which case we say that the edge is satisfied or covered). It is
easy to see that the existence of PCPs for NP-hard problems implies that Label Cover is hard to
approximate: in particular, if we use the supergraph and set the relation to be answers on which
the verifier accepts, then it is hard to distinguish instances in which at least a $c$ fraction of the
edges are satisfiable (a valid proof) from instances in which at most an $s$ fraction of the edges
are satisfiable (an invalid proof). The exact nature of the hardness assumption is based on the
size of the PCP: if it has size $m(n)$ (where $n$ is the size of the original problem instance) then
the hardness assumption is that NP is not contained in DTIME$(\mathrm{poly}(m(n)))$ (for deterministic
PCP constructions and approximation algorithms) or BPTIME$(\mathrm{poly}(m(n)))$ (for randomized PCP
constructions or approximation algorithms).
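
To make the definition concrete, here is a minimal Python sketch that computes the Label Cover value of a tiny instance by brute force. The function name `label_cover_value` and the toy instance (a 4-cycle with equality constraints on three edges and an inequality constraint on the fourth) are illustrative assumptions, not taken from the paper; the search is exponential and is only meant for hand-sized examples.

```python
from itertools import product

def label_cover_value(A, B, sigma_a, sigma_b, relation):
    """Maximum fraction of edges satisfiable by a single labeling.

    `relation` maps each edge (a, b) to its nonempty set of accepted
    label pairs. Brute force over all labelings (tiny instances only).
    """
    edges = list(relation)
    best = 0
    for labels_a in product(sigma_a, repeat=len(A)):
        gamma_a = dict(zip(A, labels_a))
        for labels_b in product(sigma_b, repeat=len(B)):
            gamma_b = dict(zip(B, labels_b))
            satisfied = sum(
                (gamma_a[a], gamma_b[b]) in relation[(a, b)] for a, b in edges
            )
            best = max(best, satisfied)
    return best / len(edges)

# Hypothetical toy instance: the supergraph is a 4-cycle.
A = ["a1", "a2"]
B = ["b1", "b2"]
SIGMA = [0, 1]
EQUAL = {(0, 0), (1, 1)}
DIFFERENT = {(0, 1), (1, 0)}
RELATION = {
    ("a1", "b1"): EQUAL,
    ("a1", "b2"): EQUAL,
    ("a2", "b1"): EQUAL,
    ("a2", "b2"): DIFFERENT,
}

value = label_cover_value(A, B, SIGMA, SIGMA, RELATION)
print(value)  # 0.75: the three equalities force a2 = b2, so one edge must fail
```

In PCP terms this toy instance has soundness 3/4: no proof makes the verifier accept on more than three of the four possible queries.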

We now describe a slight variant of Label Cover known as Min-Rep (originally defined by
Kortsarz [24]) that has been useful for proving hardness of approximation for network design problems
such as spanners. It can be thought of as a minimization version of Label Cover with the additional
property that the alphabets are represented explicitly as vertices in a graph. Consider the Min-Rep
graph $G' = (A \times \Sigma_A, B \times \Sigma_B, E')$. For every $i \in A$ let $A_i = \{(i, \alpha) \in A \times \Sigma_A\}$ be the set of vertices
in the Min-Rep graph corresponding to vertex $i$ in the Label Cover graph, and similarly for $i \in B$
let $B_i = \{(i, \beta) \in B \times \Sigma_B\}$. Our goal is to choose a set $S$ of vertices of $G'$ of minimum size so
that for every $(i, j) \in E$ there are vertices $(i, \alpha)$ and $(j, \beta)$ in $S$ that are joined by an edge in $E'$.
Such a set is called a REP-cover, and the vertices in it are called representatives. Less formally,
we can think of Min-Rep as being the problem of assigning to every vertex in the supergraph a
set of labels/representatives (unlike Label Cover, in which only a single label is assigned) so that
for every superedge $(a, b)$ there is at least one label assigned to $a$ and at least one label assigned
to $b$ that satisfy the relation $\pi_{(a,b)}$. Note that in the Min-Rep graph the number of vertices is
$|A| \cdot |\Sigma_A| + |B| \cdot |\Sigma_B|$, which means that the size of a Min-Rep instance might be larger than the
size of the associated Label Cover instance. The supergirth of a Min-Rep graph is just the girth of
the supergraph, i.e. the girth of the associated Label Cover instance.
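
The following Python sketch illustrates the REP-cover definition on a small example. The names `is_rep_cover` and `min_rep_cover` and the toy instance are assumptions made for illustration only; the minimum is found by exhaustive search over subsets, which is feasible only for very small Min-Rep graphs.

```python
from itertools import combinations

def is_rep_cover(S, relation):
    """True if every superedge (a, b) has representatives (a, x) and (b, y)
    in S such that (x, y) is an accepted pair, i.e. an edge of E'."""
    S = set(S)
    return all(
        any((a, x) in S and (b, y) in S for x, y in accepted)
        for (a, b), accepted in relation.items()
    )

def min_rep_cover(A, B, sigma_a, sigma_b, relation):
    """Smallest REP-cover, by brute force over subsets in order of size."""
    vertices = [(a, x) for a in A for x in sigma_a]
    vertices += [(b, y) for b in B for y in sigma_b]
    for size in range(len(vertices) + 1):
        for S in combinations(vertices, size):
            if is_rep_cover(S, relation):
                return set(S)

# Hypothetical toy instance: 4-cycle supergraph, three equality relations
# and one inequality relation, so no single labeling satisfies every edge.
A = ["a1", "a2"]
B = ["b1", "b2"]
SIGMA = [0, 1]
EQUAL = {(0, 0), (1, 1)}
DIFFERENT = {(0, 1), (1, 0)}
RELATION = {
    ("a1", "b1"): EQUAL,
    ("a1", "b2"): EQUAL,
    ("a2", "b1"): EQUAL,
    ("a2", "b2"): DIFFERENT,
}

cover = min_rep_cover(A, B, SIGMA, SIGMA, RELATION)
print(len(cover))  # 5: one supervertex needs two representatives
```

Four representatives (one per supervertex) would amount to a labeling that satisfies every edge, which this instance does not have, so the optimum exceeds the number of supervertices.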

Two parameters of PCPs/Label Cover that will be important for us are the degree and the 
girth. The degree of a PCP is the maximum degree of a vertex in the supergraph / associated 
Label Cover instance. Similarly, the girth of a PCP is the girth in the supergraph / associated 
Label Cover instance (recall that the girth of a graph is the length of the smallest cycle). 

1.2 The basic k-spanner problem and previous work

The basic k-spanner problem, also called the minimum size k-spanner problem, is the second main
subject in this paper. In this problem we are given an undirected, unweighted graph $G = (V, E)$ and are
asked to find the subgraph $G' = (V, E')$ with the minimum number of edges with the property that

\[ \frac{\mathrm{dist}_{G'}(u,v)}{\mathrm{dist}_{G}(u,v)} \le k \quad \text{for every two vertices } u, v \in V. \tag{1} \]

In the above, the distance between two vertices is just the number of edges in the shortest path
between the two vertices, and $\mathrm{dist}_H$ is the distance function on a graph $H$. Any subgraph $G'$
satisfying (1) is called a k-spanner of $G$, and our goal is to find the k-spanner with the fewest
edges.
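
Condition (1) can be checked directly with breadth-first search. The sketch below is a minimal illustration under assumed names (`adjacency`, `bfs_distances`, `is_k_spanner`), using a hypothetical 5-cycle as the example graph; it verifies a candidate subgraph and does not attempt to find a minimum one.

```python
from collections import deque

def adjacency(vertices, edges):
    """Adjacency sets of an undirected, unweighted graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_k_spanner(vertices, edges, sub_edges, k):
    """Check dist_{G'}(u, v) <= k * dist_G(u, v) for all connected pairs."""
    adj_g = adjacency(vertices, edges)
    adj_sub = adjacency(vertices, sub_edges)
    for u in vertices:
        dist_g = bfs_distances(adj_g, u)
        dist_sub = bfs_distances(adj_sub, u)
        for v, d in dist_g.items():
            if v not in dist_sub or dist_sub[v] > k * d:
                return False
    return True

# Hypothetical example: a 5-cycle, and the path left after deleting one edge.
vertices = list(range(5))
cycle = [(i, (i + 1) % 5) for i in range(5)]
path = cycle[:-1]  # drops the edge (4, 0)

print(is_k_spanner(vertices, cycle, path, 4))  # True
print(is_k_spanner(vertices, cycle, path, 3))  # False: 4 and 0 are now 4 hops apart
```

The example also shows why girth matters for spanners: an edge can be dropped from a k-spanner only if it lies on a cycle of length at most k + 1.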



Elkin and Peleg [15] proved that if Min-Rep is hard to approximate when restricted to instances
with large supergirth, then the basic k-spanner problem is also hard to approximate (the required
supergirth of the Min-Rep instances depends on the value $k$). For completeness, we shall later
describe their reduction and proof.

We discuss some previous work on spanner problems. The concept of graph spanners was first
introduced by [26] in a geometric context. To the best of our knowledge the spanner problem
on general graphs was first introduced indirectly by Peleg and Upfal [30] in their work on small
routing tables. This problem was first explicitly defined in [29], although several papers claim that
this problem was introduced in [28] (the two journal versions are from the same year).

Spanners appear in remarkably many applications. Peleg and Upfal [30] showed an application
of spanners to maintaining small routing tables. For further applications toward this subject 
see 1 11, ^, 33, 34]. In |2^] a relation between sparse spanners and synchronizers for distributed 
networks was found. In [||, [jl], |14|, |0J] applications of spanners to parallel, distributed, and streaming 
algorithms for shortest paths are described. For applications of spanners to distance oracles see 
[||, 35]. For applications of spanners in property testing and related subjects see |7|, || ^, |3l[ . 



For completeness, we note that the basic k-spanner problem has been solved for the special case
of $k = 2$: there is an $O(\log n)$ approximation [25], and that is the best possible [24].



1.3 Our results 

All of our results hold for large n, so throughout this paper we will assume that n is sufficiently 
large. Our first result is on the hardness of approximating Label Cover with large girth: 

Theorem 1.1. Assuming NP $\not\subseteq$ BPTIME$(2^{\mathrm{polylog}(n)})$, for any constant $\epsilon > 0$ and for $3 \le k \le
\log^{1-2\epsilon} n$ there is no polynomial-time algorithm that approximates Label Cover with girth greater
than $k$ to a factor better than $2^{(\log^{1-\epsilon} n)/k}$.

We also show how to adapt this hardness from Label Cover to Min-Rep, which then gives us
strong hardness for basic k-spanner.

Theorem 1.2. Assuming NP $\not\subseteq$ BPTIME$(2^{\mathrm{polylog}(n)})$, for any constant $\epsilon > 0$ and for $3 \le k \le
\log^{1-2\epsilon} n$ there is no polynomial-time algorithm that approximates Min-Rep with supergirth greater
than $k$ to a factor better than $2^{(\log^{1-\epsilon} n)/k}$.

Theorem 1.3. Assuming NP $\not\subseteq$ BPTIME$(2^{\mathrm{polylog}(n)})$, for any constant $\epsilon > 0$ and for $3 \le k \le
\log^{1-2\epsilon} n$ there is no polynomial-time algorithm that approximates the basic k-spanner problem to a
factor better than $2^{(\log^{1-\epsilon} n)/k}$.

This is the same hardness for basic k-spanner as was claimed by [15], and so (as they note) is tight
at least for the case of $k = \log^{\mu} n$ where $\mu$ is any constant in $(0, 1)$. This is because the best known
approximation algorithm for basic k-spanner [1] gives a ratio of essentially $n^{\Theta(1/k)} = 2^{\Theta(\log^{1-\mu} n)}$,
while Theorem 1.3 implies hardness of $2^{\log^{1-\epsilon-\mu} n}$ for arbitrarily small constant $0 < \epsilon < (1 - \mu)/2$.



1.4 The error in [15]



To the best of our knowledge the question answered in Theorem 1.2 regarding the hardness of
Min-Rep with large supergirth was first presented in ICALP 2000 by Elkin and Peleg [15]. In [15] the
authors tried to create Min-Rep instances with large supergirth that are also hard to approximate
as follows. They started with a 3-Sat(5) graph with large supergirth, which is relatively easy to
obtain by standard transformations of 3CNF formulae; the supergirth is with respect to the graph
that has the clauses and variables as vertices, with an edge between a clause and a variable if the
variable is in the clause. They then applied the parallel repetition theorem [32] and claimed to
boost the hardness while maintaining the large supergirth. This reduction is incorrect (as Elkin
and Peleg acknowledge in [16]), as non-simple cycles such as paths in the original graph become
simple cycles after parallel repetition is applied. In fact the supergirth in the construction of [15] is
4, no matter what the initial supergirth before parallel repetition is, and thus [15] does not imply
any hardness whatsoever for the large supergirth Min-Rep problem. For the interested reader, in
the conference version of [15] it is Lemma 13 which is incorrect.

Regarding the hardness of basic k-spanner, in [15] a reduction is given from Min-Rep with
supergirth larger than $k + 1$ to the basic k-spanner problem for $k \ge 3$. While this reduction is
correct, since the hardness proof for large supergirth Min-Rep in [15] is incorrect the reduction does
not actually imply any hardness for basic k-spanner.

The actual situation before our paper is as follows. No hardness whatsoever was known for the
Min-Rep with large supergirth problem; our hardness result comes 12 years after this question was
first posed. Regarding lower bounds for the basic k-spanner problem, this question was first raised
in [24] in APPROX 1998. The best lower bound known (before our paper) was $\Omega(\log n)/k$. The
improved hardness we give comes 14 years after this question was first posed.

1.5 Some remarks on our techniques 

The approach taken by Elkin and Peleg was to first create an instance with large supergirth, and 
then apply parallel repetition to boost the gap. Unfortunately, parallel repetition destroys the 
supergirth, bringing it back down to 4. Our idea is to apply parallel repetition first, boosting 
the gap, and then randomly sample superedges to sparsify the supergraph. It turns out, perhaps 
surprisingly, that to a certain extent these random choices do not decrease the gap. This may seem 
non-intuitive at first as usually the gap is closely related to superdegree and a smaller superdegree 
would imply a smaller gap. This may have been one of the obstacles in finding a lower bound for 
Min-Rep with large supergirth. However, it turns out that it is possible to keep the gap despite 
the smaller superdegree. 

The hardness that we derive this way is actually for the Label Cover problem with large girth and
not for the Min-Rep problem with large supergirth. The standard reduction from Label Cover to
Min-Rep [24] entails duplications of many supervertices. This is needed in order to ensure regularity
in the Min-Rep graph, which is used to ensure that removing a $\mu$ fraction of the supervertices
will imply a removal of at most a $\mu$ fraction of the superedges. In [24] this duplication is done
after the parallel repetition step, as the supergirth was not an important quantity. However, such
duplications add many cycles of length 4 in the supergraph. We handle this difficulty by performing
duplication before we apply parallel repetition. It is clear that parallel repetition applied to a regular
graph maintains the regularity, which is the property that we need to go from the maximization
objective of Label Cover to the minimization objective of Min-Rep.



2 Sampling Lemma for 2-query PCPs 

We begin with our general 2-query PCP sampling lemma. We remark that similar subsampling 
techniques have been used before (notably by Goldreich and Sudan [20] to give almost-linear size 
PCPs), but we specialize the technique with an eye towards giving a tradeoff between the soundness 
and the girth. Since we will only be concerned with regular PCPs, we will phrase it for the
special case when the supergraph has $|A| = |B| = n/2$ and is regular with degree $d$. We will
assume without loss of generality that $|\Sigma_A| \ge |\Sigma_B|$. Given such a PCP verifier (i.e. Label Cover
instance) $G = (A, B, E)$, let $G_\alpha$ be the verifier/instance that we get by sub-sampling the edges with
probability $p_\alpha = \frac{\alpha \log |\Sigma_A|}{d}$, i.e. every edge $e \in E$ is included in $G_\alpha$ independently with probability
$p_\alpha$.

Lemma 2.1. Let $G = (A, B, E)$ be a 2-query PCP verifier/Label Cover instance with completeness
1 and soundness $s$ in which $|A| = |B| = n/2$, every vertex has degree $d$, and $|\Sigma_A| \ge |\Sigma_B|$. Let
$1 \le \alpha \le 1/s$. Then with high probability $G_\alpha$ is a PCP verifier with completeness 1 and soundness
at most $4e/\alpha$.

Proof. It is obvious that $G_\alpha$ has completeness 1 with probability 1, since any valid labeling/proof
of $G$ is also valid for $G_\alpha$. To bound the soundness, first fix a proof / labeling. We know that in
the original verifier, at most an $s$ fraction of the edges are satisfied. After sampling, the expected
number of satisfied edges is at most

\[ p_\alpha s |E| \le \frac{|E| \log |\Sigma_A|}{d} = \frac{n}{2} \log |\Sigma_A|, \]

where the inequality uses $\alpha s \le 1$ and the equality uses $|E| = nd/2$.
Since the sampling decisions are independent we can apply a Chernoff bound (see e.g. [13,
Theorem 1.1]), giving us that the probability that more than $e n \log |\Sigma_A|$ edges are satisfied is at most
$2^{-e n \log |\Sigma_A|} = |\Sigma_A|^{-en}$. But the total number of possible proofs is at most $|\Sigma_A|^{n/2} |\Sigma_B|^{n/2} \le |\Sigma_A|^{n}$.
So by a union bound, the probability that any labeling satisfies more than $e n \log |\Sigma_A|$ edges is at
most $|\Sigma_A|^{-(e-1)n} < 2^{-n}$, which is negligible. But the expected total number of edges after
sampling is $p_\alpha |E| = \frac{n}{2} \alpha \log |\Sigma_A|$, and so another Chernoff bound implies that with high probability the
number of edges after sampling is at least $(n/4) \alpha \log |\Sigma_A|$. Thus with high probability no proof is
accepted with probability more than $(e n \log |\Sigma_A|) / ((n/4) \alpha \log |\Sigma_A|) = 4e/\alpha$. □



Lemma 2.1 shows that we can sample edges so that the average degree is about $\alpha \log |\Sigma_A|$ without
hurting the soundness too much (in particular, the soundness becomes basically $1/\alpha$). Note that
if we set $\alpha = 1/s$ this allows us to reduce the average degree to basically $(1/s) \log |\Sigma_A|$ (a possibly
significant reduction) without affecting the soundness by more than a constant. We would like
to claim that this lets us increase the girth, but at this point we will merely prove that any edge is
unlikely to be in short cycles. Later we will deterministically remove edges that take part in short
cycles, but since that might destroy approximate-regularity (which is necessary for our reduction
to Min-Rep) we put it off until later.
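
The sampling step itself is simple to state in code. The following Python sketch is only an illustration: the helper names, the circulant test graph, and the choice of base-2 logarithms are assumptions of this sketch rather than details fixed by the text. It keeps each edge independently with probability $\alpha \log |\Sigma_A| / d$ and reports the resulting average degree, which should be close to $\alpha \log |\Sigma_A|$.

```python
import math
import random

def sampling_probability(alpha, sigma_a_size, d):
    """p_alpha = alpha * log|Sigma_A| / d (base-2 log is an assumption here)."""
    return alpha * math.log2(sigma_a_size) / d

def subsample(edges, p, rng):
    """Keep every edge independently with probability p."""
    return [e for e in edges if rng.random() < p]

def regular_bipartite(m, d):
    """Hypothetical d-regular bipartite test graph with m vertices per side:
    a_i is joined to b_{(i + j) mod m} for j = 0, ..., d - 1."""
    return [(("a", i), ("b", (i + j) % m)) for i in range(m) for j in range(d)]

m, d = 200, 50                 # n = 2m supervertices, original degree d
alpha, sigma_a_size = 4, 4     # illustrative values only
edges = regular_bipartite(m, d)

p = sampling_probability(alpha, sigma_a_size, d)  # 4 * 2 / 50 = 0.16
rng = random.Random(0)                            # fixed seed for repeatability
sampled = subsample(edges, p, rng)

average_degree = len(sampled) / m   # each edge touches one vertex on each side
print(p, average_degree)            # the second value should be near 8
```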

Lemma 2.2. Fix an edge $(u, v) \in G$. Conditioned on $(u, v) \in G_\alpha$, the probability that $(u, v)$ is in
a cycle in $G_\alpha$ of length at most $k$ is at most $\frac{2 (\alpha \log |\Sigma_A|)^{k-1}}{d}$.

Proof. Let $4 \le k' \le k$ (note that no edge is in a cycle of length less than 4 in any bipartite
graph, including $G$). Fix a cycle of length $k'$ in $G$ that contains $(u, v)$. After conditioning on
$(u, v)$ surviving the sampling, the probability that all of the other edges in the cycle are also in
$G_\alpha$ is $p_\alpha^{k'-1} = \left( \frac{\alpha \log |\Sigma_A|}{d} \right)^{k'-1}$. On the other hand, we know from the degree bound in $G$ that
the number of $k'$-cycles containing $(u, v)$ is at most $d^{k'-2}$. So a union bound implies that the
probability that $(u, v)$ is in a $k'$-cycle in $G_\alpha$ is at most $\frac{(\alpha \log |\Sigma_A|)^{k'-1}}{d}$. Now we take a union bound
over all $4 \le k' \le k$ to get that the total probability that $(u, v)$ is in a cycle of length at most $k$ is
at most

\[ \sum_{k'=4}^{k} \frac{(\alpha \log |\Sigma_A|)^{k'-1}}{d} \le \frac{2 (\alpha \log |\Sigma_A|)^{k-1}}{d} \]

as claimed (assuming without loss of generality that $\alpha \log |\Sigma_A| \ge 2$). □



It is easy to see that subsampling preserves approximate regularity, but we will now prove so 
formally. 

Lemma 2.3. If $\alpha \ge 16 \log n$ then with probability at least $1 - 2/n$ the degree of every vertex in $G_\alpha$
is between $\frac{1}{2} \alpha \log |\Sigma_A|$ and $2 \alpha \log |\Sigma_A|$.

Proof. Since $G$ is regular with degree $d$ and $p_\alpha = \frac{\alpha \log |\Sigma_A|}{d}$, the expected degree of a vertex $v$ in
$G_\alpha$ is clearly $\alpha \log |\Sigma_A|$. So by a Chernoff bound (see e.g. [13, Theorem 1.1]), the probability that
the degree is less than 1/2 of this or more than twice this is at most $2 e^{-\alpha \log |\Sigma_A| / 8} \le 2 / |\Sigma_A|^{\alpha/8}$.
Since $\alpha \ge 16 \log n$ and $|\Sigma_A| \ge 2$, this probability is at most $2/n^2$, so taking a union bound over
vertices implies that the probability that all of them have degree in the desired range is at least
$1 - 2/n$. □

3 Label Cover and Min-Rep with large (super)girth 

In this section we show that Label Cover and Min-Rep are both hard to approximate, even when 
restricted to instances with large (super)girth. We start with a PCP verifier with supergraph G 
and Min-Rep graph $H$, and then use the previously described random sampling technique to get
a new supergraph $G_\alpha$ and Min-Rep graph $H_\alpha$. We now remove from $G_\alpha$ any edge that is in a
cycle of length at most $k$, giving us a new supergraph $G'_\alpha$ and Min-Rep graph $H'_\alpha$ (where an edge
$((a, \delta), (b, \beta))$ from $H$ is in $H'_\alpha$ if $(a, b)$ remains as an edge in $G'_\alpha$). These instances will form our
reduction.

We say that an edge $(i, j) \in G_\alpha$ is bad if it is not in $G'_\alpha$, i.e. it is part of a cycle of length at
most $k$ in $G_\alpha$.
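
The deterministic clean-up step can be sketched as follows. This is a minimal Python illustration with assumed names (`detour_length`, `remove_short_cycle_edges`) and a hypothetical example graph: an edge $(u, v)$ lies on a cycle of length at most $k$ exactly when $u$ and $v$ are joined by a path of length at most $k - 1$ that avoids the edge itself, which a depth-limited breadth-first search can detect. All bad edges are identified in the sampled graph first and then removed together.

```python
from collections import deque

def detour_length(adj, u, v, limit):
    """Length of a shortest u-v path that avoids the edge {u, v},
    or None if no such path of length <= limit exists."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] >= limit:
            continue  # do not explore beyond the depth limit
        for y in adj[x]:
            if (x == u and y == v) or y in dist:
                continue  # skip the edge under test and visited vertices
            dist[y] = dist[x] + 1
            if y == v:
                return dist[y]
            queue.append(y)
    return None

def remove_short_cycle_edges(edges, k):
    """Return (good_edges, bad_edges): bad edges lie on a cycle of length <= k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    bad = {(u, v) for u, v in edges if detour_length(adj, u, v, k - 1) is not None}
    good = [e for e in edges if e not in bad]
    return good, bad

# Hypothetical bipartite example: a 4-cycle and a 6-cycle joined by a bridge.
four_cycle = [("a0", "b0"), ("a1", "b0"), ("a1", "b1"), ("a0", "b1")]
six_cycle = [("a2", "b2"), ("a3", "b2"), ("a3", "b3"),
             ("a4", "b3"), ("a4", "b4"), ("a2", "b4")]
bridge = [("a0", "b2")]
edges = four_cycle + six_cycle + bridge

good4, bad4 = remove_short_cycle_edges(edges, 4)
good6, bad6 = remove_short_cycle_edges(edges, 6)
print(len(bad4), len(good4))  # 4 7: only the 4-cycle is destroyed
print(good6)                  # [('a0', 'b2')]: only the bridge survives
```

Any cycle of length at most $k$ in the remaining graph would also be a cycle in the original one, so all of its edges would have been marked bad; hence the result has girth greater than $k$.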



Lemma 3.1. Let $16 \log n \le \alpha \le \min\{1/s, d^{1/k} / \log |\Sigma_A|\}$. Then with probability larger than 2/3
the number of bad edges is at most $O\left( \frac{n (\alpha \log |\Sigma_A|)^{k}}{d} \right) \le O(n)$.



Proof. Lemma 2.2 and Markov's inequality imply that with probability at least 3/4, at most a
$\frac{8 (\alpha \log |\Sigma_A|)^{k-1}}{d}$ fraction of the edges are bad. With high probability the total number of edges in $G_\alpha$
is $\Theta(n \alpha \log |\Sigma_A|)$, so this means that the number of bad edges is at most $O\left( \frac{n (\alpha \log |\Sigma_A|)^{k}}{d} \right)$. By our
choice of $\alpha$, this is at most $O(n)$. □

Theorem 3.1. If there is no (randomized) polynomial time algorithm that can distinguish between
instances of Label Cover in which $|A| = |B| = n/2$ and all vertices have degree $d$ where all edges
are satisfiable and instances where at most an $s$ fraction of the edges are satisfiable, then there
is some constant $c$ so that for $18 \log n \le \alpha \le \min\{1/s, d^{1/k} / \log |\Sigma_A|\}$ there is no (randomized)
polynomial time algorithm that can distinguish between instances of Label Cover with girth larger
than $k$ in which all edges are satisfiable and instances in which at most a $c/\alpha$ fraction of the edges
are satisfiable.

Proof. If there is a labeling that satisfies all edges of $G$, then clearly the same labeling satisfies all
edges of $G'_\alpha$. On the other hand, suppose that only an $s$ fraction of the edges of $G$ can be satisfied.
By Lemma 3.1, the number of bad edges is at most $O(n)$, so the total number of edges in $G'_\alpha$ is
still $\Theta(n \alpha \log |\Sigma_A|)$.

Fix any labeling of $G'_\alpha$, and suppose that it satisfies a $\beta$ fraction of the edges of $G'_\alpha$. Then even
if it would have satisfied all of the bad edges, the number of edges of $G_\alpha$ that it satisfies is at most
$\beta \cdot \Theta(n \alpha \log |\Sigma_A|) + O(n)$. By Lemma 2.1 this must be at most $(4e/\alpha) \cdot \Theta(n \alpha \log |\Sigma_A|)$, and thus
for some constant $c$ we have that $\beta \le c/\alpha$.

Thus if we could distinguish between the case when every edge of $G'_\alpha$ can be satisfied and the
case when at most a $c/\alpha$ fraction can be satisfied, we could distinguish between the case when every
edge of $G$ can be satisfied and the case when at most an $s$ fraction can be satisfied. □

We now reduce to Min-Rep. We first show how the size of the minimum REP-cover of $H_\alpha$
depends on $G$. We will then show that, similar to Label Cover, we can deterministically remove
small cycles to get the instance $H'_\alpha$ with large supergirth that preserves this gap.

Lemma 3.2. Let $18 \log n \le \alpha \le 1/s$. If there is a valid labeling of $G$ then the Min-Rep instance
$H_\alpha$ has a REP-cover of size $n$ (where $n$ is the number of vertices in the supergraph). Otherwise,
with high probability the smallest REP-cover has size at least $\Omega(n \sqrt{\alpha})$.



Proof. If there is a valid labeling of $G$ then by Lemma 2.1 there is a valid labeling of $G_\alpha$ (since
the completeness remains 1), and thus there is a REP-cover of $H_\alpha$ of size $n$ as claimed. On the
other hand, suppose that at most an $s$ fraction of the edges of $G$ can be satisfied. Then since the
soundness of $G_\alpha$ is at most $4e/\alpha$ by Lemma 2.1, any labeling of $G_\alpha$ satisfies at most a $4e/\alpha$ fraction
of the edges. Suppose that there is a REP-cover of $H_\alpha$ of size less than $n \sqrt{\alpha} / (36 \sqrt{3e})$. We will
show that this implies that there is a labeling of $G_\alpha$ that satisfies more than a $4e/\alpha$ fraction of the
edges, giving a contradiction and proving the lemma.

Suppose that the smallest REP-cover for $H_\alpha$ has size $\beta n$. This means that the average number
of representatives/labels assigned to each vertex in $G_\alpha$ by this cover is $\beta$. To analyze the labeling
that covers the most edges, we analyze the random labeling obtained by choosing for each vertex
a label uniformly at random from the set of labels it is assigned by the REP-cover. Let $A' \subseteq A$
be the set of vertices in $A$ that receive at most $18\beta$ labels in this REP-cover, and define $B' \subseteq B$
analogously. Note that $|A'| \ge (8/9)|A|$ and similarly $|B'| \ge (8/9)|B|$, since otherwise the total
number of representatives in the REP-cover is larger than $(1/9) \cdot (n/2) \cdot (18\beta) = \beta n$, contradicting
our assumption on the size of the REP-cover. With high probability the fraction of edges of $G_\alpha$
that touch a vertex of $A \setminus A'$ is at most $\frac{2d/9}{(2d/9) + (8/9)(d/2)} = \frac{1}{3}$, and similarly for the fraction of
edges that touch $B \setminus B'$ (where we used the approximate regularity from Lemma 2.3). So if we
consider the subgraph of $G_\alpha$ induced by $A' \cup B'$ we still have at least 1/3 of the edges of $G_\alpha$.

Now let $(a, b) \in A' \times B'$ be such an edge. Since we started with a REP-cover, there is at least
one representative assigned to $a$ and one representative assigned to $b$ that satisfy the relation $\pi_{(a,b)}$.
Since both endpoints have at most $18\beta$ representatives in the REP-cover, the probability that these
two representatives are the assigned labels is at least $1/(18\beta)^2$. Thus by linearity of expectations
we expect that at least a $1/(3 (18\beta)^2) = 1/(972 \beta^2)$ fraction of the edges are covered by our random
labeling, so there exists some labeling that covers at least that many. If $\beta < \frac{\sqrt{\alpha}}{36 \sqrt{3e}}$ then this is
more than $4e/\alpha$, giving a contradiction. Thus the smallest REP-cover has size at least $(n \sqrt{\alpha}) / (36 \sqrt{3e})$,
proving the lemma. □
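
The rounding argument in the proof above, which turns a REP-cover into a random labeling, can be illustrated on a toy instance. In the Python sketch below the function name `expected_satisfied_fraction` and the example cover are hypothetical; the code computes the exact expected fraction of edges satisfied when every supervertex picks one of its representatives uniformly at random, and then checks numerically that the threshold $\beta = \sqrt{\alpha} / (36 \sqrt{3e})$ makes $1/(972 \beta^2)$ equal to $4e/\alpha$.

```python
import math
from fractions import Fraction

def expected_satisfied_fraction(reps, relation):
    """Exact expected fraction of edges satisfied when each supervertex
    independently picks one of its representatives uniformly at random."""
    total = Fraction(0)
    for (a, b), accepted in relation.items():
        hits = sum((x, y) in accepted for x in reps[a] for y in reps[b])
        total += Fraction(hits, len(reps[a]) * len(reps[b]))
    return total / len(relation)

# Hypothetical toy instance and REP-cover of size 5 on a 4-cycle supergraph.
EQUAL = {(0, 0), (1, 1)}
DIFFERENT = {(0, 1), (1, 0)}
RELATION = {
    ("a1", "b1"): EQUAL,
    ("a1", "b2"): EQUAL,
    ("a2", "b1"): EQUAL,
    ("a2", "b2"): DIFFERENT,
}
REPS = {"a1": [0], "a2": [0], "b1": [0], "b2": [0, 1]}

print(expected_satisfied_fraction(REPS, RELATION))  # 3/4

# Constant check: at beta = sqrt(alpha) / (36 sqrt(3e)) the two bounds meet.
alpha = 100.0  # illustrative value
beta = math.sqrt(alpha) / (36 * math.sqrt(3 * math.e))
print(math.isclose(1 / (972 * beta ** 2), 4 * math.e / alpha))  # True
```

Since some labeling does at least as well as the expectation, a small REP-cover always yields a labeling satisfying a noticeable fraction of the edges, which is what drives the lower bound.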

We will now get rid of small cycles by using the instance $H'_\alpha$, proving hardness of Min-Rep with
large supergirth.

Theorem 3.2. If there is no (randomized) polynomial time algorithm that can distinguish between
instances of Label Cover in which $|A| = |B| = n/2$ and all vertices have degree $d$ where all edges are
satisfiable and instances where at most an $s$ fraction of the edges are satisfiable, then there is some
constant $c$ so that for $18 \log n \le \alpha \le \min\{1/s, d^{1/k} / \log |\Sigma_A|\}$ there is no (randomized) polynomial
time algorithm that can distinguish between instances of Min-Rep with supergirth larger than $k$
where the smallest REP-cover has size at most $n$ and instances where the smallest REP-cover has
size at least $n \sqrt{\alpha} / c$ (here $n$ is the size of the supergraph).

Proof. If there is a labeling that satisfies all edges of $G$, then clearly choosing that labeling gives
a valid REP-cover of $H'_\alpha$ of size at most $n$. For the other case, suppose that any labeling of
$G$ satisfies at most an $s$ fraction of the edges. Then by Lemma 3.2, with high probability the
smallest REP-cover of $H_\alpha$ has size at least $\Omega(n \sqrt{\alpha})$. By Lemma 3.1, the number of bad edges is
at most $O(n)$. Removing any particular edge (in particular a bad edge) can only decrease the size
of the optimal REP-cover by at most 2, so if we remove all bad edges (getting $H'_\alpha$) we are left
with an instance with supergirth larger than $k$ in which the smallest REP-cover has size at least
$\Omega(n \sqrt{\alpha}) - O(n) = \Omega(n \sqrt{\alpha})$. By construction the supergirth is greater than $k$, so this proves the
theorem. □



Now we define and analyze the PCP / Label Cover instances to which we will apply Theorems 3.1
and 3.2. Recall that Max-3SAT(5) is the problem of finding an assignment to variables of a 3-CNF
formula that maximizes the number of satisfied clauses, with the additional property that every
variable appears in exactly 5 clauses of the formula. We begin with the standard Label Cover
instance for Max-3SAT(5). The graph $(A, B, E)$ has $|B| = n'$ and $|A| = 5n'/3$
(where $n'$ is the number of variables in the instance), and every vertex in $A$ has degree 3 and every
vertex in $B$ has degree 5. Vertices in $A$ correspond to clauses and vertices in $B$ correspond to
variables. The alphabet sizes are $|\Sigma_A| = 7$ and $|\Sigma_B| = 2$. The PCP Theorem implies that
the gap for these instances is constant, i.e. it is hard to distinguish the case when all edges are
satisfiable from the case in which at most a $1/(1 + \epsilon)$ fraction of the edges are satisfiable, for some constant $\epsilon$.

Now we take three copies of $A$, call them $A_1, A_2, A_3$, and let $A'$ be their union (so $|A'| = 5n'$).
Similarly we take five copies of $B$ to get $B_i$ for $i \in [5]$, and take their union to be $B'$. Now between
each $A_i$ and each $B_j$ we put a copy of the original edge set $E$ (which we will call $E_{ij}$), giving us
edge set $E'$. Note that $|B'| = |A'| = 5n'$ and every vertex has degree 15. Obviously if the original
instance has all edges satisfiable then that is still true of this instance. On the other hand, suppose
in the original instance at most $1/(1 + \epsilon)$ of the edges are satisfiable. Then fix any labeling of $A'$
and $B'$. This induces some labeling of $A_i$ and $B_j$, which we know can only satisfy $1/(1 + \epsilon)$ of the
edges in $E_{ij}$. Since this is true for all $i, j$, the total fraction of satisfied edges is at most $1/(1 + \epsilon)$.
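
The duplication step is easy to check mechanically. The Python sketch below is an illustration under assumptions: the function name `duplicate` is made up, and the base graph is a hypothetical toy in which 5 clauses each contain the same 3 variables, so that clause vertices have degree 3 and variable vertices have degree 5. It builds the 3 copies of $A$ and 5 copies of $B$ and confirms that the result is balanced and 15-regular.

```python
from collections import Counter

def duplicate(A, B, E, copies_a=3, copies_b=5):
    """Take copies of each side and place a copy of E between every pair
    (A_i, B_j), as in the regularization step described above."""
    A2 = [(a, i) for i in range(copies_a) for a in A]
    B2 = [(b, j) for j in range(copies_b) for b in B]
    E2 = [((a, i), (b, j))
          for (a, b) in E
          for i in range(copies_a)
          for j in range(copies_b)]
    return A2, B2, E2

# Hypothetical base graph: n' = 3 variables, 5n'/3 = 5 clauses,
# every clause contains all three variables.
clauses = ["c0", "c1", "c2", "c3", "c4"]
variables = ["x0", "x1", "x2"]
E = [(c, x) for c in clauses for x in variables]

A2, B2, E2 = duplicate(clauses, variables, E)
degree = Counter()
for u, v in E2:
    degree[u] += 1
    degree[v] += 1

print(len(A2), len(B2), len(E2))   # 15 15 225
print(set(degree.values()))        # {15}
```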

We now apply parallel repetition $\ell$ times. Now each side has size $(5n')^{\ell}$, the degree is $d = 15^{\ell}$,
and the alphabet sizes are $|\Sigma_A| = 7^{\ell}$ and $|\Sigma_B| = 2^{\ell}$. By the parallel repetition theorem [32], unless
NP $\subseteq$ BPTIME$(n^{O(\ell)})$ it is hard to distinguish between the case when all edges are satisfiable
and when at most a $2^{-\ell/c}$ fraction are satisfiable for some constant $c$. We can apply Theorem 3.1
to this construction to get the following hardness result.

Theorem 3.3. Assuming NP $\not\subseteq$ BPTIME$(2^{\mathrm{polylog}(n)})$, for any constant $\epsilon > 0$ and $3 \le k \le
\log^{1-2\epsilon} n$ there is no polynomial time algorithm that can approximate Label Cover with girth greater
than $k$ to a factor better than $2^{(\log^{1-\epsilon} n)/k}$.

Proof. Set $\ell = \log^{1/\epsilon} n'$, so the size of the Label Cover instance is
$n = (5n')^{\log^{1/\epsilon} n'}$ and $\ell^{\epsilon} = \log n'$.
Note that $\log n = \Theta(\ell \log n') = \Theta(\ell^{1+\epsilon})$, so
$\ell = \Theta((\log n)^{1/(1+\epsilon)})$. Let $\alpha = \min\{2^{\ell/c},\, 15^{\ell/k}/(\ell \log 7)\}$.
Assuming that $k$ is at most $\log^{1/(1+\epsilon) - \gamma} n$ for some constant $\gamma > 0$ implies that
$\alpha \ge 16 \log n$, so applying Theorem 3.1 to this construction implies that, assuming
$\mathrm{NP} \not\subseteq \mathrm{BPTIME}(n^{O(\ell)})$, there is no polynomial time algorithm that can
distinguish between instances of Label Cover with girth greater than $k$ in which all edges are
satisfiable and instances in which at most a $c/\alpha$ fraction are satisfiable (for some constant $c$).
Using a smaller $\epsilon$ to change $1/(1+\epsilon)$ to $1 - \epsilon$, as well as to get rid of lower order
terms, gives the theorem. □



On the other hand, if we apply Theorem 3.2 to this construction then we get the following
theorem:

Theorem 3.4. Assuming $\mathrm{NP} \not\subseteq \mathrm{BPTIME}(2^{\mathrm{polylog}(n)})$, for any constant
$\epsilon > 0$ and $3 \le k \le \log^{1-2\epsilon} n$ there is no polynomial time algorithm that can distinguish
between instances of Min-Rep with supergirth greater than $k$ that have a REP-cover of size at most
$\hat{n}$ and instances in which the smallest REP-cover has size at least
$2^{(\log^{1-\epsilon} n)/k} \cdot \hat{n}$, where $n$ is the size of the Min-Rep graph and $\hat{n}$ is the size
of the supergraph. Thus there is no polynomial time algorithm that can approximate Min-Rep with
supergirth greater than $k$ to a factor better than $2^{(\log^{1-\epsilon} n)/k}$.

Proof. As before, we set $\ell = \log^{1/\epsilon} n'$ (so $\ell^{\epsilon} = \log n'$). Then
$n = (5n')^{\ell} \cdot 7^{\ell} + (5n')^{\ell} \cdot 2^{\ell} \le 2(35n')^{\ell}$
is the number of vertices in the resulting Min-Rep instance. Note that, as in Label Cover,
$\log n = \Theta(\ell \log n') = \Theta(\ell^{1+\epsilon})$, so $\ell = \Theta((\log n)^{1/(1+\epsilon)})$.
Applying Theorem 3.2 to this construction implies that unless
$\mathrm{NP} \subseteq \mathrm{BPTIME}(2^{\mathrm{polylog}(n)})$, there is no polynomial time algorithm that
can distinguish between instances of Min-Rep with supergirth larger than $k$ where the smallest
REP-cover has size at most $2(5n')^{\ell}$ and instances where the smallest REP-cover has size at least
$\Omega\big((5n')^{\ell} \sqrt{\min\{2^{\ell/c},\, 15^{\ell/k}/(\ell \log 7)\}}\big)$ (assuming
$k \le \log^{1/(1+\epsilon) - \gamma} n$ for some constant $\gamma > 0$). Plugging in our choice of $\ell$, and
using smaller values of $\epsilon$ to get rid of lower order terms and replace $1/(1+\epsilon)$ by
$1 - \epsilon$, gives the theorem. □
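The parameter bookkeeping after $\ell$ rounds of parallel repetition is easy to check numerically. The sketch below simply evaluates the quantities named in the text (side size $(5n')^{\ell}$, degree $15^{\ell}$, alphabet sizes $7^{\ell}$ and $2^{\ell}$) and the Min-Rep vertex count; the function name `repeated_sizes` is a hypothetical helper introduced for illustration.

```python
def repeated_sizes(n_prime, ell):
    """Sizes after ell rounds of parallel repetition of the degree-15 instance."""
    side = (5 * n_prime) ** ell            # vertices on each side of the Label Cover graph
    degree = 15 ** ell                     # degree of every vertex
    sigma_a, sigma_b = 7 ** ell, 2 ** ell  # alphabet sizes
    # Each supergraph vertex becomes one Min-Rep vertex per label.
    min_rep_vertices = side * sigma_a + side * sigma_b
    return side, degree, sigma_a, sigma_b, min_rep_vertices

side, degree, sigma_a, sigma_b, n = repeated_sizes(n_prime=3, ell=2)
print(side, degree, sigma_a, sigma_b, n, 2 * (35 * 3) ** 2)
```

The last two printed values illustrate the bound $n \le 2(35n')^{\ell}$ used in the proof, which holds because $(5n')^{\ell} 7^{\ell} = (35n')^{\ell}$ and $(5n')^{\ell} 2^{\ell} \le (35n')^{\ell}$.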

4 Hardness of basic k-spanner

We now show how to reduce Min-Rep with large supergirth to the basic $k$-spanner problem. This
reduction is from Elkin and Peleg [15], which is in turn very similar to reductions from Min-Rep
to other spanner problems developed by Elkin and Peleg [16]. We include it here just for
completeness, and try to use their notation when possible. Suppose that we are given a Min-Rep
instance with supergraph $\hat{G} = (A, B, \hat{E})$ with supergirth at least $k + 2$, as well as the Min-Rep
graph $G = (A \times \Sigma_A, B \times \Sigma_B, E)$. As before, for $i \in A$ let
$A_i = \{(i,\alpha) : \alpha \in \Sigma_A\}$ be the set of vertices in the Min-Rep graph corresponding to the
vertex $i$ in the supergraph, and similarly for $i \in B$ let $B_i = \{(i,\beta) : \beta \in \Sigma_B\}$. Let
$n = |A| \cdot |\Sigma_A| + |B| \cdot |\Sigma_B|$ denote the size of the Min-Rep graph, and let
$\hat{n} = |A| + |B|$ denote the size of the supergraph. Since this instance comes from our
previous reduction, we can also assume that $|A| = |B| = \hat{n}/2$. Let
$k_A = \lfloor (k-1)/2 \rfloor$, $k_B = \lceil (k-1)/2 \rceil$,
and let $x = n^2/\hat{n}$. To define the $k$-spanner graph $G'$, we first define two new vertex sets:

$S = \{s^p_{i,j} : i \in A,\ j \in [k_A],\ p \in [x]\}$ and $T = \{t^p_{i,j} : i \in B,\ j \in [k_B],\ p \in [x]\}$.

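The vertex set of our graph $G'$ will be $V' = A \cup B \cup S \cup T$ (here $A$ and $B$ stand for the two
sides of the Min-Rep graph, so that $|V'| = n + |S| + |T|$). Now we define a collection of edge
sets:

$E_M = \{(s^p_{i,j}, s^p_{i,(j+1)}) : p \in [x],\ i \in A,\ j \in [k_A - 1]\}$
$\qquad \cup\ \{(t^p_{i,j}, t^p_{i,(j+1)}) : p \in [x],\ i \in B,\ j \in [k_B - 1]\}$,

$E_{sA} = \{(s^p_{i,1}, u) : i \in A,\ u \in A_i,\ p \in [x]\}$,
$E_{tB} = \{(t^p_{j,1}, w) : j \in B,\ w \in B_j,\ p \in [x]\}$,
$E^p_{\hat{G}} = \{(s^p_{i,k_A}, t^p_{j,k_B}) : (i,j) \in \hat{E}\}$ for each $p \in [x]$, and $E_{\hat{G}} = \bigcup_{p \in [x]} E^p_{\hat{G}}$.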

We let the edge set $E'$ of $G'$ be $E \cup E_M \cup E_{sA} \cup E_{tB} \cup E_{\hat{G}}$. Note that when
$k = 3$, $k_A = k_B = 1$, so $E_M$ is empty. Also note that for each $p \in [x]$, the edges
$E^p_{\hat{G}}$ form a graph isomorphic to the supergraph $\hat{G}$.
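To make the definition of $G'$ concrete, here is a minimal Python sketch that builds the five edge sets from a toy Min-Rep instance. The vertex encoding (tuples tagged `"a"`, `"b"`, `"s"`, `"t"`), the function name `build_spanner_instance`, and the toy instance are assumptions made for illustration; in particular the toy uses a small $x$ rather than $x = n^2/\hat{n}$, and superedges are read off from $E$, which assumes every superedge carries at least one Min-Rep edge.

```python
def build_spanner_instance(A, B, sigma_a, sigma_b, E, k, x):
    """Edge sets of G'. E is a set of Min-Rep edges ((i, la), (j, lb))."""
    k_a, k_b = (k - 1) // 2, k // 2        # floor and ceiling of (k-1)/2
    copies = range(x)
    sets = {
        "E": {(("a", i, la), ("b", j, lb)) for (i, la), (j, lb) in E},
        "E_M": {(("s", p, i, l), ("s", p, i, l + 1))
                for p in copies for i in A for l in range(1, k_a)}
             | {(("t", p, j, l), ("t", p, j, l + 1))
                for p in copies for j in B for l in range(1, k_b)},
        "E_sA": {(("s", p, i, 1), ("a", i, la))
                 for p in copies for i in A for la in sigma_a},
        "E_tB": {(("t", p, j, 1), ("b", j, lb))
                 for p in copies for j in B for lb in sigma_b},
    }
    # One copy of the supergraph per p, between the far ends of the chains.
    superedges = {(i, j) for (i, _), (j, _) in E}
    sets["E_G"] = {(("s", p, i, k_a), ("t", p, j, k_b))
                   for p in copies for (i, j) in superedges}
    return sets

def vertices(sets):
    return {v for edges in sets.values() for e in edges for v in e}

# Toy instance: supergraph is the path (0,0), (0,1), (1,1), so it has no cycles.
A, B, labels = [0, 1], [0, 1], [0, 1]
E = {((0, 0), (0, 0)), ((0, 0), (1, 0)), ((1, 0), (1, 0)), ((0, 1), (0, 1))}
g3 = build_spanner_instance(A, B, labels, labels, E, k=3, x=2)
g5 = build_spanner_instance(A, B, labels, labels, E, k=5, x=2)
print({name: len(edges) for name, edges in g3.items()}, len(vertices(g5)))
```

For $k = 3$ the set `E_M` comes out empty, as noted in the text, and each copy $p$ contributes exactly one `E_G` edge per superedge.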

For an edge $(s^p_{i,k_A}, t^p_{j,k_B}) \in E^p_{\hat{G}}$, we say that a spanning path $P$ (i.e. a path from
$s^p_{i,k_A}$ to $t^p_{j,k_B}$ of length at most $k$) is a canonical path if it has the form
$s^p_{i,k_A}, s^p_{i,(k_A-1)}, \ldots, s^p_{i,1}, u_i, w_j, t^p_{j,1}, t^p_{j,2}, \ldots, t^p_{j,k_B}$.
In other words, the path first follows the path of $E_M$ edges to $s^p_{i,1}$, then uses an edge from
$E_{sA}$ to get to one of the original Min-Rep nodes $u_i$ that corresponds to supernode $i$, then takes an
original Min-Rep edge to $w_j$, then an $E_{tB}$ edge out to $t^p_{j,1}$, and then follows $E_M$ edges to
$t^p_{j,k_B}$. Note that such a path must exist, or else there are no edges from $A_i$ to $B_j$, in which
case $(i,j)$ would not be an edge in the supergraph, which would mean that $(s^p_{i,k_A}, t^p_{j,k_B})$
would not be an edge in $E^p_{\hat{G}}$. Furthermore, any canonical path has length exactly
$(k_A - 1) + 1 + 1 + 1 + (k_B - 1) = k$, so is indeed a valid spanning path.
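The length computation for canonical paths can be checked mechanically. The following sketch lists the vertices of a canonical path under a hypothetical tuple encoding (`("s", p, i, l)` for $s^p_{i,l}$ and `("t", p, j, l)` for $t^p_{j,l}$) and confirms that it uses exactly $k$ edges; `canonical_path` is an illustrative helper, not notation from the paper.

```python
def canonical_path(i, j, p, u, w, k):
    """Vertices of the canonical path from s^p_{i,k_A} to t^p_{j,k_B} through u and w."""
    k_a, k_b = (k - 1) // 2, k // 2                      # floor, ceiling of (k-1)/2
    down = [("s", p, i, l) for l in range(k_a, 0, -1)]   # s^p_{i,k_A}, ..., s^p_{i,1}
    up = [("t", p, j, l) for l in range(1, k_b + 1)]     # t^p_{j,1}, ..., t^p_{j,k_B}
    return down + [u, w] + up

# Number of edges on the path for a range of stretch values k.
lengths = {k: len(canonical_path(0, 1, 0, ("a", 0, 0), ("b", 1, 0), k)) - 1
           for k in range(3, 12)}
print(lengths)
```

The path has $k_A + 2 + k_B = k + 1$ vertices and hence $k$ edges, for both odd and even $k$.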

Lemma 4.1. In any $k$-spanner $H$ of $G'$, every edge in $E_{\hat{G}}$ is either included in $H$ or is spanned
by a canonical path.

Proof. Suppose this is false, and let $e = (s^p_{i,k_A}, t^p_{j,k_B})$ be an edge in $E_{\hat{G}}$ which is not
in $H$ but is also not spanned by a canonical path. Let
$P = (s^p_{i,k_A} = x_1, x_2, \ldots, x_{q-1}, x_q = t^p_{j,k_B})$ be the shortest simple path in $H$ that does
span $e$ (such a path with $q \le k + 1$ must exist since $H$ is a $k$-spanner of $G'$). Let
$U = \{s^p_{i',k_A} : i' \in A\} \cup \{t^p_{j',k_B} : j' \in B\}$ be the set of vertices that are incident on
edges of $E^p_{\hat{G}}$. Let $x_\alpha$ be the first vertex in $P$ that is not in $U$. Such a vertex must exist,
since if it does not then $P$ is a path of length at most $k$ between $s^p_{i,k_A}$ and $t^p_{j,k_B}$ that is
contained in $E^p_{\hat{G}}$. This is a contradiction, since $E^p_{\hat{G}}$ is isomorphic to the supergraph
$\hat{G}$ which by assumption has girth at least $k + 2$, while adding $e$ to $P$ would give a cycle of length
at most $k + 1$.

So $x_\alpha$ is the first vertex in $P$ that is not in $U$. This means that it must be either
$s^p_{i',(k_A-1)}$ or $t^p_{j',(k_B-1)}$ for some $i' \in A$ or $j' \in B$. If it is $t^p_{j',(k_B-1)}$, then $P$
must keep following $E_M$ edges to get that $x_{\alpha+k_B-2}$ is $t^p_{j',1}$ and thus $x_{\alpha+k_B-1}$ is
$w$ for some $w \in B_{j'}$. Since $\alpha \ge 2$ and $k_A + k_B = k - 1$, there can be only $k_A + 1$ more
vertices on the path. But it is obvious that, if $j' \ne j$ (which it must be since otherwise the path
would have already hit $t^p_{j,k_B}$), there is no way to complete the path.

If $x_\alpha = s^p_{i',(k_A-1)}$, then since the intermediate vertices have degree 2 and $P$ is simple it must
be the case that $x_{\alpha+k_A-2}$ is $s^p_{i',1}$ and so $x_{\alpha+k_A-1}$ is $u$ for some $u \in A_{i'}$. From
$u$ the next vertex could either be $s^{p'}_{i',1}$ for some other $p'$, or could be $w \in B_{j'}$ for some
$j' \in B$. If $x_{\alpha+k_A}$ is $s^{p'}_{i',1}$ for some $p'$, then the next hop cannot be to a node in
$A_{i'}$, or else we could have gotten to this node instead of to $u$ earlier, contradicting our
assumption that $P$ is the shortest path. Thus if $x_{\alpha+k_A}$ is $s^{p'}_{i',1}$ for some $p'$, it must be
that $P$ follows $E_M$ edges backwards to get that $x_{\alpha+2k_A-1} = s^{p'}_{i',k_A}$. Note that
$\alpha \ge 2$ by definition, and $2k_A \ge k - 2$, so either $x_k$ or $x_{k+1}$ is $s^{p'}_{i',k_A}$. Either one
is an obvious contradiction, since $x_{k+1}$ is supposed to be $t^p_{j,k_B}$, which is not adjacent to
$s^{p'}_{i',k_A}$.

This means that from $u$ the next vertex in $P$ must be $w \in B_{j'}$ for some $j' \in B$, or equivalently
that $x_{\alpha+k_A} = w$. Since $\alpha \ge 2$ and $k_A + k_B = k - 1$, there can be at most $k_B$ more vertices
on $P$. In order to get to $t^p_{j,k_B}$ via $E_M$ edges, it obviously must be the case that $j' = j$ and $P$
is actually a canonical path. Otherwise, if the next hop from $w$ is another edge from $E$ then it is
back on the $A$ side of the Min-Rep graph, which is clearly too far away from $t^p_{j,k_B}$ to finish the
path. Thus $P$ must actually be a canonical path. □

We will now define some edge sets that will be useful in the next lemma. For each $i \in A$, let
$u_i \in A_i$ be some arbitrarily chosen vertex in $A_i$, and let
$E_i = \big(\bigcup_{u \in A_i} (s^1_{i,1}, u)\big) \cup \big(\bigcup_{p \in [x]} (s^p_{i,1}, u_i)\big)$.
Similarly, for each $j \in B$ we arbitrarily choose some node $w_j \in B_j$, and let
$E_j = \big(\bigcup_{w \in B_j} (w, t^1_{j,1})\big) \cup \big(\bigcup_{p \in [x]} (w_j, t^p_{j,1})\big)$. Let
$\tilde{E} = (\bigcup_{i \in A} E_i) \cup (\bigcup_{j \in B} E_j)$. Clearly $|\tilde{E}| = n + x\hat{n}$, since
$|E_i| = |A_i| + x$ for each $i \in A$ and $|E_j| = |B_j| + x$ for each $j \in B$.

We say that a spanner $H$ of $G'$ is a proper $k$-spanner if it does not include any edge of $E_{\hat{G}}$,
which by Lemma 4.1 implies that every edge of $E_{\hat{G}}$ is spanned by a canonical path.



Lemma 4.2. Any $k$-spanner $H$ for $G'$ can be converted in polynomial time into a proper $k$-spanner
$H'$ of size at most $6|H|$.

Proof. We first let $H_1$ be the edge set $(H \setminus E_{\hat{G}}) \cup E \cup E_M \cup \tilde{E}$. It is obvious
that all edges of $G'$ are $k$-spanned by $H_1$ except for $H \cap E_{\hat{G}}$: edges in $E$, $E_M$, and
$\tilde{E}$ are self-spanned, edges in $E_{sA}$ and $E_{tB}$ are 3-spanned by $\tilde{E}$, and edges in
$E_{\hat{G}} \setminus H$ must have been spanned in $H$ by a canonical path (by Lemma 4.1), which is still
contained in $H_1$. We now claim that $H_1$ is small. Note that
$|V'| = n + x(\hat{n}/2)k_A + x(\hat{n}/2)k_B = n + x(\hat{n}/2)(k - 1)$. So

\begin{align*}
|H_1| &\le |H| + |E| + |E_M| + |\tilde{E}| \\
&\le |H| + n^2 + x(\hat{n}/2)(k_A - 1) + x(\hat{n}/2)(k_B - 1) + n + \hat{n}x \\
&\le |H| + x\hat{n} + x(\hat{n}/2)(k - 3) + n + \hat{n}x \\
&\le |H| + (|V'| - 1) + (|V'| - 1) + (|V'| - 1) \\
&\le 4|H|,
\end{align*}

where we used the fact that $k \ge 3$ and the fact that $|H| \ge |V'| - 1$ (since it is connected).

Now we need to span the edges in $H \cap E_{\hat{G}}$. For each such edge $(s^p_{i,k_A}, t^p_{j,k_B})$ there is an
associated superedge $(i, j)$. We get $H'$ from $H_1$ by adding, for each such edge, the edges
$(s^p_{i,1}, u)$ and $(w, t^p_{j,1})$ for some $u \in A_i$ and $w \in B_j$ so that $(u, w) \in E$ (note that some
such edge must exist, since $(i,j)$ is a superedge). This obviously creates a canonical path that spans
$(s^p_{i,k_A}, t^p_{j,k_B})$ (since $E_M \subseteq H_1$) while costing at most $2|H|$, and thus $H'$ is a valid
$k$-spanner of size at most $6|H|$ as claimed. □

We can now prove one direction of the reduction:

Lemma 4.3. Given a $k$-spanner $H$ for $G'$, we can construct in polynomial time a REP-cover for
$G$ of size at most $6|H|/x$.



Proof. We first apply Lemma 4.2 to get a proper $k$-spanner $H'$ of size at most $6|H|$. Now for
every $p \in [x]$ and $i \in A$ let $S^p_i = \{u \in A_i : (s^p_{i,1}, u) \in E(H')\}$, and similarly for
$j \in B$ let $T^p_j = \{w \in B_j : (w, t^p_{j,1}) \in E(H')\}$. Now for each $p \in [x]$ let
$U^p = (\bigcup_{i \in A} S^p_i) \cup (\bigcup_{j \in B} T^p_j)$. We
claim that for every $p \in [x]$, the set $U^p$ is a valid REP-cover. This is enough to prove the lemma,
since clearly the smallest $U^p$ has size at most $6|H|/x$ by averaging (each vertex in each $U^p$ can be
charged to the edge in $E_{sA}$ or $E_{tB}$ that caused it to be in $U^p$).

Consider a superedge $(i,j) \in \hat{E}$. Since $H'$ is a proper $k$-spanner, it contains a canonical path
from $s^p_{i,k_A}$ to $t^p_{j,k_B}$. By the definition of canonical path, this implies that it includes the
edges $(s^p_{i,1}, u)$, $(u, w)$, and $(w, t^p_{j,1})$ for some $u \in A_i$ and $w \in B_j$. Thus $u \in S^p_i$ and
$w \in T^p_j$ with $(u, w) \in E$. Since this is true for every superedge, it implies that $U^p$ is a valid
REP-cover. □
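The averaging step in the proof of Lemma 4.3 amounts to reading off one candidate cover $U^p$ per copy $p$ and keeping the smallest. A minimal sketch, assuming a hypothetical tuple encoding of vertices (`("s", p, i, l)`, `("t", p, j, l)`, `("a", i, label)`, `("b", j, label)`) and a small hand-made edge list in place of a real proper spanner $H'$:

```python
def extract_rep_cover(spanner_edges, x):
    """Return the smallest U^p, where U^p collects the Min-Rep vertices
    joined in the spanner to s^p_{i,1} or t^p_{j,1}."""
    covers = [set() for _ in range(x)]
    for a, b in spanner_edges:
        for hub, other in ((a, b), (b, a)):
            is_hub = hub[0] in ("s", "t") and hub[3] == 1
            if is_hub and other[0] in ("a", "b"):
                covers[hub[1]].add(other)   # hub[1] is the copy index p
    return min(covers, key=len)

def is_rep_cover(C, min_rep_edges):
    """C is a REP-cover if every superedge has a Min-Rep edge inside C."""
    superedges = {(u[1], w[1]) for u, w in min_rep_edges}
    covered = {(u[1], w[1]) for u, w in min_rep_edges if u in C and w in C}
    return covered == superedges

# Toy data: one superedge (0, 0), two copies (x = 2).
a00, a01, b00 = ("a", 0, 0), ("a", 0, 1), ("b", 0, 0)
min_rep_edges = [(a00, b00)]
H_prime = [
    (("s", 0, 0, 1), a00), (("s", 0, 0, 1), a01), (b00, ("t", 0, 0, 1)),  # copy p = 0
    (("s", 1, 0, 1), a00), (b00, ("t", 1, 0, 1)),                         # copy p = 1
    (a00, b00),                                                           # Min-Rep edge
]
cover = extract_rep_cover(H_prime, x=2)
print(sorted(cover), is_rep_cover(cover, min_rep_edges))
```

Here copy $p = 1$ yields the smaller set, and it is a valid REP-cover for the toy instance.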

We now want to prove the other direction, that the existence of a small REP-cover for $G$ implies
a small $k$-spanner for $G'$.

Lemma 4.4. Given a REP-cover $C$ for $G$, we can construct in polynomial time a $k$-spanner $H$ of
$G'$ with at most $(k + 1)x|C|$ edges.

Proof. The edge set of our $k$-spanner $H$ is

$\{(s^p_{i,1}, u) : i \in A,\ u \in A_i \cap C,\ p \in [x]\} \cup \{(t^p_{j,1}, w) : j \in B,\ w \in B_j \cap C,\ p \in [x]\} \cup E \cup E_M \cup \tilde{E}$.

To see that $H$ is a $k$-spanner, first note that every edge in $E \cup E_M \cup \tilde{E}$ is spanned by
itself. It is easy to see that any edge in $E_{sA}$ or $E_{tB}$ is spanned by $\tilde{E}$ (in fact, 3-spanned).
And since $C$ is a valid REP-cover, for any edge in $E_{\hat{G}}$ there is a canonical path included in $H$.

Now we need to bound the size of $H$. Clearly it is at most

\begin{align*}
|H| &\le x|C| + |E| + |E_M| + |\tilde{E}| \\
&\le x|C| + n^2 + x|A|(k_A - 1) + x|B|(k_B - 1) + (n + x\hat{n}) \\
&\le x|C| + x\hat{n} + x(\hat{n}/2)(k_A - 1) + x(\hat{n}/2)(k_B - 1) + n + x\hat{n} \\
&\le x|C| + x|C| + x(\hat{n}/2)(k - 3) + x\hat{n} + x\hat{n} \\
&\le 4x|C| + x|C|\,\frac{k-3}{2} \\
&\le (k + 1)x|C|,
\end{align*}

where we used the fact that $|C| \ge \hat{n}$ and that by definition $x = n^2/\hat{n}$. □
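The chain of inequalities bounding $|H|$ can be sanity-checked with exact rational arithmetic. The sketch below evaluates the right-hand sides of the successive lines for given $n$, $\hat{n}$, $k$ and $|C|$ and checks that they never decrease; the function name `lemma_4_4_chain` and the sample parameter ranges are illustrative assumptions, and the second-to-last line uses the reading $4x|C| + x|C|(k-3)/2$ of the garbled source.

```python
from fractions import Fraction

def lemma_4_4_chain(n, n_hat, k, cover_size):
    """Right-hand sides of the successive bounds on |H| (exact arithmetic)."""
    assert k >= 3 and cover_size >= n_hat and n >= 1
    k_a, k_b = (k - 1) // 2, k // 2       # floor and ceiling of (k-1)/2
    x = Fraction(n * n, n_hat)            # x = n^2 / n_hat
    half = Fraction(n_hat, 2)             # |A| = |B| = n_hat / 2
    c = cover_size
    return [
        x * c + n * n + x * half * (k_a - 1) + x * half * (k_b - 1) + (n + x * n_hat),
        x * c + x * n_hat + x * half * (k_a - 1) + x * half * (k_b - 1) + n + x * n_hat,
        x * c + x * c + x * half * (k - 3) + x * n_hat + x * n_hat,
        4 * x * c + x * c * Fraction(k - 3, 2),
        (k + 1) * x * c,
    ]

chain = lemma_4_4_chain(n=8, n_hat=4, k=5, cover_size=6)
print([str(v) for v in chain])
```

Each step uses only $n^2 = x\hat{n}$, $n \le x\hat{n}$, $\hat{n} \le |C|$ and $k \ge 3$, which is why the list is nondecreasing for every admissible choice of parameters.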

We can now prove the main theorem of the paper, that it is hard to approximate the basic
$k$-spanner problem.

Theorem 4.1. Assuming $\mathrm{NP} \not\subseteq \mathrm{BPTIME}(2^{\mathrm{polylog}(n)})$, for any constant
$\epsilon > 0$ and $3 \le k \le \log^{1-2\epsilon} n$ there is no polynomial time approximation algorithm for the
basic $k$-spanner problem with ratio less than $2^{(\log^{1-\epsilon} n)/k}$.

Proof. Suppose that we have an $\alpha$-approximation algorithm for the basic $k$-spanner problem. Given
an instance $G$ of Min-Rep with supergirth larger than $k + 1$, we reduce it to basic $k$-spanner on
$G'$ as described above. If the smallest REP-cover has size $\hat{n}$ (i.e. we can assign a single label to
every vertex of the supergraph and get a valid proof), then by Lemma 4.4 there is a $k$-spanner
of $G'$ with at most $(k + 1)x\hat{n}$ edges. On the other hand, if the smallest REP-cover has size at
least $2^{(\log^{1-\epsilon} n)/(k+1)} \cdot \hat{n}$ then by Lemma 4.3 the smallest $k$-spanner of $G'$ must have
size at least $2^{(\log^{1-\epsilon} n)/(k+1)} \cdot \hat{n}x/6$. By Theorem 3.4 we cannot distinguish between
these two cases of Min-Rep, and thus
$\alpha \ge \big(2^{(\log^{1-\epsilon} n)/(k+1)}\, \hat{n}x/6\big) / \big((k + 1)x\hat{n}\big) = 2^{(\log^{1-\epsilon} n)/(k+1)} / (6(k + 1))$.

However, the $n$ used in the above expression is the size of the Min-Rep instance $G$, not the
size of the spanner instance $G'$. Let $n' = |V'|$ be the size of the $k$-spanner instance, and note that
$n' \le n + \hat{n}kx = n + n^2 k \le 2kn^2$. So $n \ge \sqrt{n'/(2k)}$, and thus we have hardness

$$\frac{2^{(\log^{1-\epsilon} n)/(k+1)}}{6(k+1)} \;\ge\; \frac{2^{(\log^{1-\epsilon}(n'/(2k)))/(2(k+1))}}{6(k+1)}.$$

By using an appropriately smaller $\epsilon$, and switching notation to let $n$ represent the size of the
$k$-spanner instance, this gives hardness of $2^{(\log^{1-\epsilon} n)/k}$ as claimed (assuming that
$k \le \log^{1-2\epsilon} n$). □
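The exponent $2(k+1)$ in the last display follows from a short calculation, spelled out here as a worked step. It assumes $n' \ge 2k$ (so that the logarithm is nonnegative) and $0 < \epsilon < 1$; neither assumption is stated explicitly in the text, but both hold in the regime considered.

```latex
\begin{align*}
n \ge \sqrt{\frac{n'}{2k}}
  &\implies \log n \ge \frac{1}{2}\,\log\frac{n'}{2k} \\
  &\implies \log^{1-\epsilon} n
     \ge 2^{-(1-\epsilon)}\,\log^{1-\epsilon}\frac{n'}{2k}
     \ge \frac{1}{2}\,\log^{1-\epsilon}\frac{n'}{2k} \\
  &\implies \frac{\log^{1-\epsilon} n}{k+1}
     \ge \frac{\log^{1-\epsilon}\bigl(n'/(2k)\bigr)}{2(k+1)} .
\end{align*}
```

The middle step uses $2^{-(1-\epsilon)} \ge 1/2$, which is where the factor of 2 in the denominator of the exponent comes from.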

5 Conclusion 

Motivated by proving hardness for the basic $k$-spanner problem (one of the only spanner problems
for which strong hardness was not already known), we gave a proof that Label Cover and Min-Rep
are hard to approximate even when restricted to instances with large supergirth. This result has
been claimed before [15], but their proof was fundamentally flawed by their attempt to increase the
girth before using parallel repetition. Our new proof is based on a technique (subsampling edges of
2-query PCPs) that allows us to sparsify the PCP obtained by parallel repetition enough to destroy
most small cycles without significantly losing in the soundness of the PCP (and thus the provable
hardness). This gives a proof that the basic $k$-spanner problem, which is perhaps the simplest of
spanner problems, has superpolylogarithmic hardness.

Min-Rep with large girth and large gap? An important but perhaps difficult question is if 
Min-Rep is still hard to approximate on instances with large girth (even if the supergirth is 4). 
A solution to this question would be useful in lower bounding problems such as Multicommodity 
Buy-at-Bulk, Multicommodity Cost-Distance (see Hj), and other network design problems, and 
would also lead to simplifications of already known lower bounds. 

It may not be possible to obtain hardness for the problem of Min-Rep with large girth, and in 
fact it may be that this problem has a good approximation. For example, one natural way to try 
to prove this would be to find instances where the girth is at least the supergirth, and then try to 
repeat our reduction. A natural case where the girth is at least the supergirth is if the graph induced
by $A_i \cup B_j$ is a matching for each superedge $(i,j)$. The Label Cover version (i.e. maximization) of
this is the Unique Games problem, originally defined by Khot [23]. It is easy to see that in this
case the girth is at least the supergirth, but unfortunately while Khot conjectured the problem 
to be hard to approximate (the famous Unique Games Conjecture), the conjectured hardness is 
still quite small and known upper bounds preclude the kind of strong hardness that we would like 
to prove. But are there other cases where the girth of the Min-Rep graph is large that are more 
difficult to approximate? 